Following the Xenium Data Analysis tutorial from:¶

https://squidpy.readthedocs.io/en/stable/notebooks/tutorials/tutorial_xenium.html#¶

Creating a micromamba environment to install the necessary packages for the tutorial.¶

In [ ]:
# !micromamba create -y -n xenium_analysis
# !micromamba activate xenium_analysis
# !pip install numpy
# !pip install pandas
# !pip install matplotlib
# !pip install seaborn
# !pip install scanpy
# !pip install squidpy

Import the necessary packages.¶

In [ ]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

import scanpy as sc
import squidpy as sq

Create proper folder structure and load in the data.¶

In [ ]:
# !mkdir tutorial_data
# !mkdir tutorial_data/xenium_data
# !curl -o tutorial_data/xenium_data/ https://cf.10xgenomics.com/samples/xenium/preview/Xenium_FFPE_Human_Breast_Cancer_Rep1/Xenium_FFPE_Human_Breast_Cancer_Rep1_cell_feature_matrix.h5
# !curl -o tutorial_data/xenium_data/Xenium_FFPE_Human_Breast_Cancer_Rep1_cells.csv.gz https://cf.10xgenomics.com/samples/xenium/preview/Xenium_FFPE_Human_Breast_Cancer_Rep1/Xenium_FFPE_Human_Breast_Cancer_Rep1_cells.csv.gz
In [ ]:
adata = sc.read_10x_h5(
    filename="tutorial_data/xenium_data/Xenium_FFPE_Human_Breast_Cancer_Rep1_cell_feature_matrix.h5"
)
In [ ]:
df = pd.read_csv(
    "tutorial_data/xenium_data/Xenium_FFPE_Human_Breast_Cancer_Rep1_cells.csv"
)
In [ ]:
df.set_index(adata.obs_names, inplace=True)
adata.obs = df.copy()
In [ ]:
adata.obsm["spatial"] = adata.obs[["x_centroid", "y_centroid"]].copy().to_numpy()

Preview the data.¶

In [ ]:
adata.obs
Out[ ]:
cell_id x_centroid y_centroid transcript_counts control_probe_counts control_codeword_counts total_counts cell_area nucleus_area
1 1 377.663005 843.541888 154 0 0 154 110.361875 45.562656
2 2 382.078658 858.944818 64 0 0 64 87.919219 24.248906
3 3 319.839529 869.196542 57 0 0 57 52.561875 23.526406
4 4 259.304707 851.797949 120 0 0 120 75.230312 35.176719
5 5 370.576291 865.193024 120 0 0 120 180.218594 34.499375
... ... ... ... ... ... ... ... ... ...
167778 167778 7455.404785 5115.021094 238 1 0 239 219.956094 61.412500
167779 167779 7483.771045 5111.720703 80 0 0 80 38.427969 25.964844
167780 167780 7470.119580 5119.350366 406 0 0 406 287.690469 86.158125
167781 167781 7477.704004 5128.963086 120 0 0 120 235.670469 25.016563
167782 167782 7489.376562 5123.402393 393 0 0 393 269.447344 111.445625

167782 rows × 9 columns

Calculate quality control metrics¶

QC anndata.AnnData with scanpy.pp.calculate_qc_metrics¶

In [ ]:
sc.pp.calculate_qc_metrics(adata, percent_top=(10, 20, 50, 150), inplace=True)

Calculate % of control probes and codewords from adata.obs¶

In [ ]:
cprobes = (
    adata.obs["control_probe_counts"].sum() / adata.obs["total_counts"].sum() * 100
)
cwords = (
    adata.obs["control_codeword_counts"].sum() / adata.obs["total_counts"].sum() * 100
)
print(f"Negative DNA probe count % : {cprobes}")
print(f"Negative decoding count % : {cwords}")
Negative DNA probe count % : 0.08700852649698224
Negative decoding count % : 0.005482476054877954

Plot the distribution of total transcripts per cell, area of segmented cells, and the ratio of nuclei area to their cells¶

In [ ]:
fig, axs = plt.subplots(1, 4, figsize=(15, 4))

axs[0].set_title("Total transcripts per cell")
sns.histplot(
    adata.obs["total_counts"],
    kde=False,
    ax=axs[0],
)

axs[1].set_title("Unique transcripts per cell")
sns.histplot(
    adata.obs["n_genes_by_counts"],
    kde=False,
    ax=axs[1],
)


axs[2].set_title("Area of segmented cells")
sns.histplot(
    adata.obs["cell_area"],
    kde=False,
    ax=axs[2],
)

axs[3].set_title("Nucleus ratio")
sns.histplot(
    adata.obs["nucleus_area"] / adata.obs["cell_area"],
    kde=False,
    ax=axs[3],
)
Out[ ]:
<Axes: title={'center': 'Nucleus ratio'}, ylabel='Count'>
No description has been provided for this image

Filter out the cells based on minimum number of counts and filter out genes based on minimum number of cells¶

In [ ]:
sc.pp.filter_cells(adata, min_counts=10)
sc.pp.filter_genes(adata, min_cells=5)

Other filter criteria could be cell area, DAPI signal, or minimum number of unique transcripts¶

Normalize the cell counts with sc.pp.normalize_total¶

Logarithmize, do PCA, compute a neighborhood graph of observations¶

Use sc.tl.umap to embed the neighborhood graph of the data and cluster the cells into subgroups with sc.tl.leiden(adata)¶

In [ ]:
adata.layers["counts"] = adata.X.copy()
sc.pp.normalize_total(adata, inplace=True)
sc.pp.log1p(adata)
sc.pp.pca(adata)
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.tl.leiden(adata)
c:\Users\andrew\AppData\Local\Programs\Python\Python312\Lib\site-packages\tqdm\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
C:\Users\andrew\AppData\Local\Temp\ipykernel_19940\4193936636.py:7: FutureWarning: In the future, the default backend for leiden will be igraph instead of leidenalg.

 To achieve the future defaults please pass: flavor="igraph" and n_iterations=2.  directed must also be False to work with igraph's implementation.
  sc.tl.leiden(adata)

Visualize annotation on UMAP and spatial coordinates¶

In [ ]:
sc.pl.umap(
    adata,
    color=[
        "total_counts",
        "n_genes_by_counts",
        "leiden",
    ],
    wspace=0.4,
)
No description has been provided for this image
In [ ]:
sq.pl.spatial_scatter(
    adata,
    library_id="spatial",
    shape=None,
    color=[
        "leiden",
    ],
    wspace=0.4,
)
c:\Users\andrew\AppData\Local\Programs\Python\Python312\Lib\site-packages\squidpy\pl\_spatial_utils.py:946: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap', 'norm' will be ignored
  _cax = scatter(
No description has been provided for this image

Compute spatial statistics¶

Building the spatial neighbors graph¶

In [ ]:
sq.gr.spatial_neighbors(adata, coord_type="generic", delaunay=True)

Compute centrality scores¶

In [ ]:
sq.gr.centrality_scores(adata, cluster_key="leiden")

Visualize the results¶

In [ ]:
sq.pl.centrality_scores(adata, cluster_key="leiden", figsize=(16, 5))
No description has been provided for this image

Compute co-occurence probability¶

In [ ]:
adata_subsample = sc.pp.subsample(adata, fraction=0.5, copy=True)
In [ ]:
sq.gr.co_occurrence(
    adata_subsample,
    cluster_key="leiden",
)
sq.pl.co_occurrence(
    adata_subsample,
    cluster_key="leiden",
    clusters="12",
    figsize=(10, 10),
)
sq.pl.spatial_scatter(
    adata_subsample,
    color="leiden",
    shape=None,
    size=2,
)
WARNING: `n_splits` was automatically set to `41` to prevent `82039x82039` distance matrix from being created
100%|██████████| 861/861 [13:57<00:00,  1.03/s]
WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`
c:\Users\andrew\AppData\Local\Programs\Python\Python312\Lib\site-packages\squidpy\pl\_spatial_utils.py:946: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap', 'norm' will be ignored
  _cax = scatter(
No description has been provided for this image
No description has been provided for this image

Neighbors enrichment analysis¶

In [ ]:
sq.gr.nhood_enrichment(adata, cluster_key="leiden")
100%|██████████| 1000/1000 [00:14<00:00, 68.02/s]

Visualize the results¶

In [ ]:
fig, ax = plt.subplots(1, 2, figsize=(13, 7))
sq.pl.nhood_enrichment(
    adata,
    cluster_key="leiden",
    figsize=(8, 8),
    title="Neighborhood enrichment adata",
    ax=ax[0],
)
sq.pl.spatial_scatter(adata_subsample, color="leiden", shape=None, size=2, ax=ax[1])
WARNING: Please specify a valid `library_id` or set it permanently in `adata.uns['spatial']`
c:\Users\andrew\AppData\Local\Programs\Python\Python312\Lib\site-packages\squidpy\pl\_spatial_utils.py:946: UserWarning: No data for colormapping provided via 'c'. Parameters 'cmap', 'norm' will be ignored
  _cax = scatter(
No description has been provided for this image

Compute Moran's I score¶

In [ ]:
sq.gr.spatial_neighbors(adata_subsample, coord_type="generic", delaunay=True)
sq.gr.spatial_autocorr(
    adata_subsample,
    mode="moran",
    n_perms=100,
    n_jobs=1,
)
adata_subsample.uns["moranI"].head(10)
100%|██████████| 100/100 [00:56<00:00,  1.77/s]
Out[ ]:
I pval_norm var_norm pval_z_sim pval_sim var_sim pval_norm_fdr_bh pval_z_sim_fdr_bh pval_sim_fdr_bh
KRT7 0.696668 0.0 0.000004 0.0 0.009901 0.000010 0.0 0.0 0.009997
SCD 0.671975 0.0 0.000004 0.0 0.009901 0.000008 0.0 0.0 0.009997
FOXA1 0.659014 0.0 0.000004 0.0 0.009901 0.000010 0.0 0.0 0.009997
FASN 0.653400 0.0 0.000004 0.0 0.009901 0.000009 0.0 0.0 0.009997
EPCAM 0.641954 0.0 0.000004 0.0 0.009901 0.000008 0.0 0.0 0.009997
TACSTD2 0.638978 0.0 0.000004 0.0 0.009901 0.000008 0.0 0.0 0.009997
CEACAM6 0.631554 0.0 0.000004 0.0 0.009901 0.000009 0.0 0.0 0.009997
ERBB2 0.628083 0.0 0.000004 0.0 0.009901 0.000010 0.0 0.0 0.009997
KRT8 0.619555 0.0 0.000004 0.0 0.009901 0.000009 0.0 0.0 0.009997
LUM 0.593700 0.0 0.000004 0.0 0.009901 0.000007 0.0 0.0 0.009997

Visualize the results¶

In [ ]:
sq.pl.spatial_scatter(
    adata_subsample,
    library_id="spatial",
    color=[
        "KRT7",
        "FOXA1",
    ],
    shape=None,
    size=2,
    img=False,
)
No description has been provided for this image